the classical development of statistical theory and methodology. The
analysis  of  cross-classified  categorical  data,  or  contingency  table 
analysis  as  it  is  often  referred  to,  represents  the  discrete  multivariate 
analogue  of  analysis  of  variance  for  continuous  response  variables,  and  now 
plays  an  important  role  in  statistical  practice.  This  presentation  is 
intended  as  an  introduction  to  some  of  the  more  widely  used  techniques  for 
the  analysis  of  contingency  table  data,  and  to  the  statistical  theory  that 
underlies  them. 

The term contingency, used in connection with tables of cross-classified
categorical data, seems to have originated with Karl Pearson [1904], who
for an s×t-fold table defined contingency to be any measure of the total
deviation from "independent probability." The term is now used to refer to
the table of counts itself. Prior to this formal use of the term,
statisticians going back at least to Quetelet [1849] worked with
cross-classifications of counts to summarize the association between
variables. Pearson [1900a] had laid the groundwork for his approach to
contingency tables when he developed his $\chi^2$ test for comparing
observed and expected (theoretical) frequencies. Yet Pearson preferred to
view contingency tables involving the cross-classification of two or more
polytomies as arising from a partition of a set of multivariate normal
data, with an underlying continuum for each polytomy. This view led Pearson
[1900b] to develop his tetrachoric correlation coefficient for 2×2 tables,
and this work in turn spawned an extensive literature well chronicled by
Lancaster [1969].

The most serious problems with Pearson's approach were (1) the complicated
infinite series linking the tetrachoric correlation coefficient with the
frequencies in a 2×2 table and (2) his insistence that it always made sense
to assume an underlying continuum, even when the dichotomy of interest was
dead-alive or employed-unemployed, and that it was reasonable to assume
that the probability distribution over such a continuum was normal. In
contradistinction, Yule [1900] chose to view the categories of a
cross-classification as fixed, and he set out to consider the structural
relationship between or among the discrete variables represented by the
cross-classification, via various functions of the cross-product ratio.
Especially impressive in this, Yule's first paper on the topic, is his
notational structure for n attributes or $2^n$ tables, and his attention to
the concept of partial and joint association of dichotomous variables.

The debate between Pearson and Yule over whose approach was more
appropriate for contingency table analysis raged for many years (see, e.g.,
Pearson and Heron [1913]), and the acrimony it engendered was exceeded
only by that associated with Pearson's dispute with R.A. Fisher over the
adjustment in the degrees of freedom (d.f.) for the chi-square test of
independence in the s×t-fold table. (In this latter case Pearson was
simply incorrect; as Fisher [1922] first noted, d.f. = (s-1)(t-1).)

While much work on two-dimensional contingency tables followed the
pioneering efforts by Pearson and Yule, it was not until 1935 that Bartlett,
as a result of a suggestion by Fisher, utilized Yule's cross-product ratio
to define the notion of second-order interaction in a 2×2×2 table, and to
develop an appropriate test for the absence of such an interaction. The
multivariate generalizations of Bartlett's work, beginning with the work
of Roy and Kastenbaum [1956], form the basis of the loglinear model
approach to contingency tables, which is described in detail in Section 4.

The past 25 years have seen a burgeoning literature on the analysis
of contingency tables, stemming in large part from work by S.N. Roy and
his students at North Carolina, and from that of David Cox on binary
regression. Some of this literature emphasizes the use of the minimum
modified chi-square approach (e.g., Grizzle, Starmer, and Koch [1969]),
or the use of the minimum discrimination information approach (e.g., Ku
and Kullback [1968], and Gokhale and Kullback [1978]), while the bulk
of it follows Fisher in the use of maximum likelihood. For most
contingency table problems the minimum discrimination information approach
yields maximum likelihood estimates.

Except  for  a  few  attempts  at  the  use  of  additive  models  (see,  e.g., 
Bhapkar  and  Koch  [1968])  almost  all  the  papers  written  on  the  topic 
emphasize  the  use  of  loglinear  or  logistic  models.  Key  papers  by  Birch 
[1963],  Darroch  [1962],  Good  [1963],  and  Goodman  [1963,  1964]  plus  the 
availability  of  high-speed  computers,  served  to  spur  renewed  interest 
in  the  problems  of  categorical  data  analysis.  This  in  turn  led  to  many 
articles  by  Leo  Goodman  (e.g.,  Goodman  [1968,  1969,  1970])  and 
others,  and  finally  culminated  in  books  by  Bishop,  Fienberg  and  Holland 
[1975],  Cox  [1970],  Gokhale  and  Kullback  [1978],  Haberman  [1974],  and 
Plackett [1974], all of which focus in large part on the use of loglinear
models for both two-dimensional and multidimensional tables. A detailed
bibliography for the statistical literature on contingency tables through
1974 is given by Killion and Zahn [1976].

The subsequent sections of this presentation are concerned primarily
with  the  use  of  loglinear  models  for  the  analysis  of  contingency  table 
data.  For  details  on  some  related  methods  see  the  book  by  Lancaster  [1969], 
and  the  series  of  papers  on  measures  of  association  by  Goodman  and  Kruskal, 
which  have  been  recently  reprinted  as  Goodman  and  Kruskal  [1979].  Several 
book-length  but  elementary  presentations  on  loglinear  models  are  now 
available,  including  Everitt  [1977],  Fienberg  [1980],  Haberman  [1978,  1979],  and 
Upton  [1978]. 

The next section describes two examples which will serve to illustrate
some of the methods of analysis. Then Section 3 briefly discusses some
alternative methods for estimation of parameters used in conjunction with
categorical data analysis, and Section 4 outlines the basic statistical
theory associated with maximum likelihood estimation and loglinear models.
These theoretical results are then illustrated, in Section 5, on the
examples of Section 2. The final section concludes with a guide to (a) some
recent applications of loglinear and contingency table modelling, and
(b) computer programs for contingency table analysis.

2. Two Classic Examples

The  data  reported  by  Bartlett  [1935]  in  his  pioneering  article,  and  included 
here  in  Table  1,  are  from  an  experiment  giving  the  response  (alive  or  dead)  of  240 
plants  for  each  combination  of  the  two  explanatory  variables,  time  of  planting 
(early  or  late)  and  length  of  cutting  (high  or  low). 

Table 1: 2×2×2 table of Bartlett [1935]

                              2. Time of Planting
                          Early                Late
                              3. Length of Cutting
  1. Response          High      Low       High      Low
  Alive                 156      107         84       31
  Dead                   84      133        156      209
  Total                 240      240        240      240

The  questions  to  be  answered  are:  (i)  What  are  the  effects  of  time  of 
planting  and  length  of  cutting  on  survival?  (ii)  Do  they  interact  in 
their  effect  on  survival? 

The data in Table 2, from Waite [1915], give the cross-classification
of right-hand fingerprints according to the number of whorls and small
loops.  The  total  number  of  whorls  and  small  loops  is  at  most  5,  and  the 
resulting  table  is  triangular: 


Table 2: Fingerprints of the right hand classified by the number of
whorls and small loops (Waite [1915])

                          Small loops
  Whorls      0      1      2      3      4      5    Total
  0          78    144    204    211    179     45      861
  1         106    153    126     80     32             497
  2         130     92     55     15                    292
  3         125     38      7                           170
  4         104     26                                  130
  5          50                                          50
  Total     593    453    392    306    211     45     2000

Here the question of interest is more complicated because, as a result of
the constraint forcing the data into the triangular structure, the number
of whorls is "related to" the number of small loops. Such an array of
counts is referred to as an incomplete contingency table, and the incomplete
structure, in the case of the Waite data, was the source of yet another
controversy involving Karl Pearson [1930], and, this time, J.A. Harris
(see Harris and Treloar [1927]). In Section 5, the fit of a relatively
simple model to these data is explored.

3. Estimating Parameters in Contingency Table Models

Let $x' = (x_1, x_2, \ldots, x_t)$ be a vector of observed counts for t cells,
structured in the form of a cross-classification such as in Tables 1 and 2,
where $t = 2^3 = 8$ and $t = 21$, respectively. Now let
$m' = (m_1, m_2, \ldots, m_t)$ be the vector of expected values that are
assumed to be functions of unknown parameters
$\theta' = (\theta_1, \theta_2, \ldots, \theta_s)$, where $s < t$. Thus, one
can write $m = m(\theta)$.

There  are  three  standard  sampling  models  for  the  observed  counts  in 
contingency  tables: 

(i) Poisson model. The $\{x_i\}$ are observations from independent Poisson
random variables with means $\{m_i\}$ and likelihood function

$$\prod_{i=1}^{t} m_i^{x_i} \exp(-m_i) / x_i!. \tag{1}$$

(ii) Multinomial model. The total count $N = \sum_{i=1}^{t} x_i$ is a random
sample from an infinite population where the underlying cell probabilities
are $\{m_i/N\}$, and the likelihood is

$$N! \, N^{-N} \prod_{i=1}^{t} m_i^{x_i} / x_i!. \tag{2}$$

(iii) Product-multinomial model. The cells are partitioned into
sets, and each set has an independent multinomial structure, as in (ii).


For the Bartlett data in Section 2, the sampling model is
product-multinomial: there are actually 4 independent binomials, one for
each of the 4 experimental conditions corresponding to the two factors time
of planting and length of cutting. For the fingerprint data, the sampling
model is multinomial. (See the discussion of factors and responses in the
entry, Categorical Data, by Upton.)
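As a minimal sketch of how likelihoods (1) and (2) relate, the following Python fragment evaluates both on the log scale. The counts and the two candidate mean vectors are hypothetical, chosen only so that each candidate sums to N; the function names are illustrative and not taken from any package.

```python
import math

def poisson_loglik(x, m):
    """Log of likelihood (1): independent Poisson counts with means m."""
    return sum(xi * math.log(mi) - mi - math.lgamma(xi + 1)
               for xi, mi in zip(x, m))

def multinomial_loglik(x, m):
    """Log of likelihood (2): multinomial with cell probabilities m_i / N.

    Assumes sum(m) equals N = sum(x), as the multinomial model requires.
    """
    n = sum(x)
    kernel = sum(xi * math.log(mi) - math.lgamma(xi + 1)
                 for xi, mi in zip(x, m))
    return math.lgamma(n + 1) - n * math.log(n) + kernel

# Hypothetical counts and two hypothetical mean vectors, each summing to N = 40.
x = [12, 7, 5, 16]
m_a = [10.0, 10.0, 10.0, 10.0]
m_b = [13.0, 6.0, 6.0, 15.0]

# The gap between the two log-likelihoods does not depend on m
# once the total of m is held at N.
gap_a = poisson_loglik(x, m_a) - multinomial_loglik(x, m_a)
gap_b = poisson_loglik(x, m_b) - multinomial_loglik(x, m_b)
print(gap_a, gap_b)
```

Because the gap is the same for both candidates, ranking candidate mean vectors by either likelihood gives the same answer whenever the total is fixed at N.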

For each of these sampling models the estimation problem can typically
be structured in terms of a "distance" function, $K(x, m)$, where parameter
estimates $\hat\theta$ are chosen so that the distance between x and
$\hat m = m(\hat\theta)$, as measured by $K(x, \hat m)$, is minimized. The
minimum chi-square method uses the distance function

$$X^2(x, m) = \sum_{i=1}^{t} (x_i - m_i)^2 / m_i, \tag{3}$$

the minimum modified chi-square method uses the function

$$Y^2(x, m) = \sum_{i=1}^{t} (x_i - m_i)^2 / x_i, \tag{4}$$

and the minimum discrimination information method uses either

$$G^2(x, m) = 2 \sum_{i=1}^{t} x_i \log(x_i / m_i) \tag{5}$$

or

$$G^2(m, x) = 2 \sum_{i=1}^{t} m_i \log(m_i / x_i). \tag{6}$$

Rao [1962] studies these and other choices of "distance" functions.
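The four distance functions (3) to (6) can be sketched directly in Python. The function names and the example vectors below are hypothetical illustrations; zero cells are skipped in the logarithmic sums under the usual convention that 0 log 0 = 0.

```python
import math

def pearson_x2(x, m):
    """Distance (3): sum of (x_i - m_i)^2 / m_i."""
    return sum((xi - mi) ** 2 / mi for xi, mi in zip(x, m))

def modified_y2(x, m):
    """Distance (4): sum of (x_i - m_i)^2 / x_i; requires every x_i > 0."""
    return sum((xi - mi) ** 2 / xi for xi, mi in zip(x, m))

def g2(x, m):
    """Distance (5): 2 * sum of x_i * log(x_i / m_i)."""
    return 2.0 * sum(xi * math.log(xi / mi) for xi, mi in zip(x, m) if xi > 0)

def g2_reversed(x, m):
    """Distance (6): 2 * sum of m_i * log(m_i / x_i)."""
    return 2.0 * sum(mi * math.log(mi / xi) for xi, mi in zip(x, m) if mi > 0)

# Hypothetical observed counts and hypothetical expected values (same total).
x = [12.0, 7.0, 5.0, 16.0]
m = [10.0, 10.0, 10.0, 10.0]

for name, fn in [("X2", pearson_x2), ("Y2", modified_y2),
                 ("G2(x,m)", g2), ("G2(m,x)", g2_reversed)]:
    print(name, round(fn(x, m), 3))
```

All four functions return zero when m equals x and are positive otherwise when the totals agree, which is what makes them usable as "distances" even though none is a metric in the strict sense.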

For the three basic sampling models for contingency tables, choosing
$\hat\theta$ to minimize $G^2(x, m)$ in (5) is equivalent to maximizing the
likelihood function provided that

$$\sum_{i=1}^{t} m_i(\hat\theta) = \sum_{i=1}^{t} x_i \tag{7}$$

(and that constraints similar to (7) hold for each of the sets of cells
under product-multinomial sampling, (iii)). Moreover, the estimators
that minimize each of (3), (4), (5), and (6) in such circumstances
belong to the class of Best Asymptotic Normal (BAN) estimates for m (see
Bishop, Fienberg, and Holland [1975] and Neyman [1949] for further
discussion of asymptotic equivalence). Because of various additional
asymptotic properties, and because of the smoothness of maximum likelihood
estimates in relatively sparse tables, many authors have preferred to work
with maximum likelihood estimates (MLE's), which minimize (5).
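The equivalence between minimizing (5) and maximizing the likelihood can be checked in a few lines for the Poisson model (1). The derivation below is a sketch that uses only (1), (5), and (7); the constant C is introduced here for convenience and collects terms free of $\theta$.

```latex
\begin{align*}
\log L(m) &= \sum_{i=1}^{t} \bigl[ x_i \log m_i - m_i - \log x_i! \bigr]
  && \text{from (1)} \\
G^2(x, m) &= 2 \sum_{i=1}^{t} x_i \log x_i - 2 \sum_{i=1}^{t} x_i \log m_i
  && \text{from (5)} \\
-2 \log L(m) &= G^2(x, m) + 2 \sum_{i=1}^{t} m_i + C, \qquad
  C = 2 \sum_{i=1}^{t} \bigl[ \log x_i! - x_i \log x_i \bigr].
\end{align*}
```

When (7) holds, $\sum_i m_i = \sum_i x_i$ is fixed, so $-2 \log L$ and $G^2$ differ by a quantity that does not involve $\theta$, and the same $\hat\theta$ optimizes both. The multinomial case (2) follows in the same way, since there the total is fixed by the model.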

4.  Some  Basic  Theory  for  Loglinear  Models 

For expected values $\{m_{ij}\}$ for a 2×2 table,

                     B
                 1        2
    A    1     m_11     m_12
         2     m_21     m_22

a standard measure of association for the row and column variables, A and
B, respectively, is the cross-product ratio proposed by Yule [1900]:

$$\alpha = \frac{m_{11} m_{22}}{m_{12} m_{21}} \tag{8}$$

(for a discussion of the properties of $\alpha$, see Bishop, Fienberg and
Holland [1975] or Fienberg [1980]). Independence of A and B is equivalent to
setting $\alpha = 1$, and can also be expressed in loglinear form:

$$\log m_{ij} = u + u_{1(i)} + u_{2(j)}, \tag{9}$$

where

$$\sum_{i=1}^{2} u_{1(i)} = \sum_{j=1}^{2} u_{2(j)} = 0. \tag{10}$$

Note that the choice of notation here parallels that for analysis of
variance models. (See the entry, Categorical Data, by Upton for a related
discussion, using somewhat different notation.)
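To make expressions (8) to (10) concrete, here is a small Python sketch. The 2×2 table of expected values is hypothetical, built as (row total × column total) / N so that independence holds exactly; the helper names are illustrative only.

```python
import math

def cross_product_ratio(m):
    """alpha of (8) for a 2x2 table given as [[m11, m12], [m21, m22]]."""
    return (m[0][0] * m[1][1]) / (m[0][1] * m[1][0])

def additive_u_terms(m):
    """u, u_1(i), u_2(j) of (9), from overall, row, and column means of log m."""
    logm = [[math.log(v) for v in row] for row in m]
    u = sum(sum(row) for row in logm) / 4.0
    u1 = [sum(logm[i]) / 2.0 - u for i in range(2)]
    u2 = [(logm[0][j] + logm[1][j]) / 2.0 - u for j in range(2)]
    return u, u1, u2

# Hypothetical expected values: row totals 50 and 100, column totals 90 and 60.
m = [[30.0, 20.0], [60.0, 40.0]]

alpha = cross_product_ratio(m)
u, u1, u2 = additive_u_terms(m)

# Under independence the additive form (9) reproduces every cell.
rebuilt = [[math.exp(u + u1[i] + u2[j]) for j in range(2)] for i in range(2)]
print(alpha, rebuilt)
```

The u-terms are centered exactly as in analysis of variance, so the constraints in (10) hold by construction; for a table with $\alpha \neq 1$ the rebuilt values would no longer match the original cells.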
Bartlett's [1935] no-second-order interaction model for the expected
values in a 2×2×2 table,

          k = 1                    k = 2
      m_111    m_121           m_112    m_122
      m_211    m_221           m_212    m_222

is based on equating the values of $\alpha$ in each layer of the table, i.e.,

$$\frac{m_{111} m_{221}}{m_{121} m_{211}} = \frac{m_{112} m_{222}}{m_{122} m_{212}}. \tag{11}$$

Expression (11) can be represented in loglinear form as

$$\log m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)}, \tag{12}$$

where, as in (10), each subscripted u-term sums to zero over any subscript,
e.g.,

$$\sum_{i} u_{12(ij)} = \sum_{j} u_{12(ij)} = 0. \tag{13}$$

All of the parameters in (12) can be written as functions of cross-product
ratios (see Bishop, Fienberg, and Holland [1975]).
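As an illustration of the quantity that expression (11) equates across layers, the sketch below computes the observed cross-product ratio in each layer of Bartlett's data. The counts are those of Table 1, keyed by (i, j, k) with the cell labelling used in Table 3; the dictionary layout and function name are assumptions made for this example.

```python
# Observed counts from Table 1, keyed by (i, j, k):
# i = response (1 alive, 2 dead), j = time of planting, k = length of cutting.
x = {
    (1, 1, 1): 156, (2, 1, 1): 84, (1, 2, 1): 84, (2, 2, 1): 156,
    (1, 1, 2): 107, (2, 1, 2): 133, (1, 2, 2): 31, (2, 2, 2): 209,
}

def layer_ratio(table, k):
    """Cross-product ratio of the 2x2 layer with third index k."""
    return (table[(1, 1, k)] * table[(2, 2, k)]) / (
        table[(1, 2, k)] * table[(2, 1, k)])

alpha_1 = layer_ratio(x, 1)   # 156*156 / (84*84)
alpha_2 = layer_ratio(x, 2)   # 107*209 / (31*133)
print(round(alpha_1, 3), round(alpha_2, 3))
```

The observed ratios differ between layers; model (11) asks whether that difference is larger than sampling variation would explain, which is the question the goodness-of-fit statistics address.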

For the sampling schemes described in Section 3, the minimal sufficient
statistics (MSS's) are the two-dimensional marginal totals, $\{x_{ij+}\}$,
$\{x_{i+k}\}$, and $\{x_{+jk}\}$ (except for linearly redundant statistics
included for purposes of symmetry), where a "+" indicates summation over the
corresponding subscript. The MLE's of the $\{m_{ijk}\}$ under model (12) must
satisfy the likelihood equations

$$\begin{aligned}
\hat m_{ij+} &= x_{ij+}, & i, j &= 1, 2, \\
\hat m_{i+k} &= x_{i+k}, & i, k &= 1, 2, \\
\hat m_{+jk} &= x_{+jk}, & j, k &= 1, 2,
\end{aligned} \tag{14}$$

usually solved by some form of iterative procedure. For the Bartlett
data the third set of equations in (14) corresponds to the binomial sampling
constraints.
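The marginal totals that appear on the right-hand side of (14) are simple sums over one subscript. A minimal Python sketch for the Bartlett data follows; the counts come from Table 1 with the (i, j, k) labelling of Table 3, while the `margin` helper is a hypothetical name introduced here.

```python
from collections import defaultdict

# Observed counts from Table 1, keyed by (i, j, k).
x = {
    (1, 1, 1): 156, (2, 1, 1): 84, (1, 2, 1): 84, (2, 2, 1): 156,
    (1, 1, 2): 107, (2, 1, 2): 133, (1, 2, 2): 31, (2, 2, 2): 209,
}

def margin(table, keep):
    """Sum a table keyed by (i, j, k) over every axis not listed in keep."""
    out = defaultdict(int)
    for cell, count in table.items():
        out[tuple(cell[axis] for axis in keep)] += count
    return dict(out)

x_ij = margin(x, (0, 1))   # x_{ij+}
x_ik = margin(x, (0, 2))   # x_{i+k}
x_jk = margin(x, (1, 2))   # x_{+jk}: the four binomial sample sizes

print(x_ij, x_ik, x_jk)
```

Every entry of `x_jk` equals 240, which is the sense in which the third set of equations in (14) reproduces the sampling constraints fixed by the experimental design.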

More generally, for a vector of expected values m, if the log-expectations
$(\log m_1, \ldots, \log m_t)$ are representable as a linear combination of
the parameters $\theta$, the following results hold under the Poisson and
multinomial sampling schemes of Section 3:

(A) Corresponding to each parameter in $\theta$ is a MSS that is expressible
as a linear combination of the $\{x_i\}$. (More formally, if $\mathcal{M}$ is
used to denote the loglinear model specified by $m = m(\theta)$, then the
MSS's are given by the projection of x onto $\mathcal{M}$,
$P_{\mathcal{M}} x$. For a more detailed discussion see Haberman [1974].)

(B) The MLE, $\hat m$, of m, if it exists, is unique and satisfies the
likelihood equations:

$$P_{\mathcal{M}} \hat m = P_{\mathcal{M}} x. \tag{15}$$

(Note that the equations in (14) are a special case of those given by
expression (15).)

Necessary and sufficient conditions for the existence of a solution to
the likelihood equations, (15), are relatively complex (see Haberman
[1974]). A sufficient condition is that all cell counts be positive, i.e.,
$x > 0$, but MLE's for loglinear models exist in many sparse situations
where a large fraction of the cells have zero counts.

For product-multinomial sampling situations, the basic multinomial
constraints (i.e., that the counts must add up to the multinomial sample
sizes) must be taken into account. Typically, some of the parameters in
$\theta$ which specify the loglinear model, i.e., $m = m(\theta)$, are fixed
by these constraints.

More formally, let $\mathcal{M}^*$ be a loglinear model for m under
product-multinomial sampling which corresponds to a loglinear model
$\mathcal{M}$ under Poisson sampling, such that the multinomial constraints
"fix" a subset of the parameters, $\theta$, used to specify $\mathcal{M}$.
Then

(C) The MLE of m under product-multinomial sampling for the model
$\mathcal{M}^*$ is the same as the MLE of m under Poisson sampling for the
model $\mathcal{M}$.

As a consequence of Result C, equations (14) are the likelihood
equations for the 2×2×2 table under the no-second-order interaction model
for Poisson or multinomial sampling, as well as for product-multinomial
sampling when any set of one-way or two-way marginal totals are fixed (i.e.,
these correspond to the multinomial constraints).

A final result, that is used to assess the fit of loglinear models,
can be stated in the following informal manner:

(D) If $\hat m$ is the MLE of m under a loglinear model, and if the model is
correct, then the statistics

$$X^2 = \sum_{i=1}^{t} (x_i - \hat m_i)^2 / \hat m_i \tag{16}$$

and

$$G^2 = 2 \sum_{i=1}^{t} x_i \log(x_i / \hat m_i) \tag{17}$$

have asymptotic $\chi^2$ distributions with t-s degrees of freedom, where s
is the total number of independent constraints implied by the loglinear
model and the multinomial sampling constraints (if any). If the model is not
correct then $X^2$ and $G^2$, in (16) and (17), are stochastically larger
than $\chi^2_{t-s}$. (See the entry, Chi-square Tests, by Bhapkar and Koch.)
Expression (17) is the minimizing value of the distance function (5), but
(16) is not the minimizing chi-square value for the function (3).
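Result D is applied by comparing an observed statistic with the upper tail of a chi-square distribution. The following Python sketch computes that tail probability from the standard series for the regularized lower incomplete gamma function; the function name `chi2_sf` and the default number of series terms are assumptions of this example, and the routine is intended only for moderate arguments.

```python
import math

def chi2_sf(value, df, terms=500):
    """Upper-tail probability P(chi-square with df d.f. > value).

    Uses P(a, x) = exp(-x) * x**a / Gamma(a) * sum_n x**n / (a (a+1) ... (a+n))
    with a = df / 2 and x = value / 2, then returns 1 - P(a, x).
    """
    if value <= 0:
        return 1.0
    a, x = df / 2.0, value / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# The upper 10% point of chi-square with 1 d.f. is approximately 2.706.
print(round(chi2_sf(2.706, 1), 3))
```

For very large statistics the tail probability underflows to zero in this formulation, which is harmless when the only question is whether the value lies in the extreme right-hand tail.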

In  the  next  section  these  basic  results  are  applied  in  the  context  of 
the  Bartlett  and  Waite  data  sets  of  Section  2. 

Many authors have devised techniques for selecting among the class
of loglinear models applicable for contingency table structures. These
typically (although not always) resemble corresponding model selection
procedures for analysis of variance and regression models. See, for
example, Goodman [1971] and Aitken [1978], as well as the discussions in
Bishop, Fienberg, and Holland [1975], and Fienberg [1980].


5. Contingency Table Analyses

5.1 Illustrative Analyses

For the $2^3$ table of Bartlett from Section 2, variables 2 and 3 are
fixed by design, so that $x_{+jk} = 240$, and the estimated expected values
under the no-second-order interaction model of expression (12) are
given in Table 3. These values were computed by Bishop, Fienberg and
Holland [1975] using the method of iterative proportional fitting. Bartlett
originally found the solution to equations (14) by noting that the
constraints in his specification, (11), reduced (14) to a single cubic
equation for the discrepancy $\Delta = \hat m_{111} - x_{111}$. Note that the
expected values satisfy expression (14), e.g.,
$\hat m_{12+} = 78.9 + 36.1 = 115 = 84 + 31 = x_{12+}$. The goodness-of-fit
statistics for this model are $X^2 = 2.27$ and $G^2 = 2.29$. Using Result D
of Section 4, one compares these values to tail values of the chi-square
distribution with 1 d.f., e.g., the upper 10% point
$\chi^2_1(0.10) = 2.71$, and this suggests that the no-second-order
interaction model provides an acceptable fit to the data.

Table 3: Observed and Expected Values for the Bartlett Data Under the
No-Second-Order Interaction Model

  Cell (i,j,k)    Observed x    Estimated Expected m̂
  1,1,1              156              161.1
  2,1,1               84               78.9
  1,2,1               84               78.9
  2,2,1              156              161.1
  1,1,2              107              101.9
  2,1,2              133              138.1
  1,2,2               31               36.1
  2,2,2              209              203.9

Since the parameters $u$, $\{u_{2(j)}\}$, $\{u_{3(k)}\}$, and
$\{u_{23(jk)}\}$ are fixed by the binomial sampling constraints for these
data, model (12) is often rewritten as

$$\log\left(\frac{m_{1jk}}{m_{2jk}}\right) = 2\left[u_{1(1)} + u_{12(1j)} + u_{13(1k)}\right] = w + w_{2(j)} + w_{3(k)}, \tag{18}$$

where

$$\sum_{j} w_{2(j)} = \sum_{k} w_{3(k)} = 0. \tag{19}$$

Expression (18) is referred to as a logit model for the log-odds for alive
versus dead. The simple additive structure corresponds to Bartlett's
notion of no second-order interaction.
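The fit summarized in Table 3 can be sketched with a short pure-Python version of iterative proportional fitting. This is a minimal illustration, not the program used by the cited authors: the function names, the starting values of 1.0, and the fixed number of cycles are all assumptions of the example. It fits the three two-way margins of equations (14) and then checks the additivity of the fitted logits in (18).

```python
import math

# Observed counts from Table 1, keyed by (i, j, k) as in Table 3.
x = {
    (1, 1, 1): 156, (2, 1, 1): 84, (1, 2, 1): 84, (2, 2, 1): 156,
    (1, 1, 2): 107, (2, 1, 2): 133, (1, 2, 2): 31, (2, 2, 2): 209,
}

def margin(table, keep):
    """Sum a table keyed by (i, j, k) over the axes not listed in keep."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[axis] for axis in keep)
        out[key] = out.get(key, 0.0) + value
    return out

def ipf(observed, margins, cycles=200):
    """Rescale fitted values to match each observed margin in turn."""
    fitted = {cell: 1.0 for cell in observed}
    targets = [margin(observed, keep) for keep in margins]
    for _ in range(cycles):
        for keep, target in zip(margins, targets):
            current = margin(fitted, keep)
            for cell in fitted:
                key = tuple(cell[axis] for axis in keep)
                fitted[cell] *= target[key] / current[key]
    return fitted

# Fit the margins {x_ij+}, {x_i+k}, {x_+jk} of equations (14).
m_hat = ipf(x, margins=[(0, 1), (0, 2), (1, 2)])

x2 = sum((x[c] - m_hat[c]) ** 2 / m_hat[c] for c in x)    # statistic (16)
g2 = 2.0 * sum(x[c] * math.log(x[c] / m_hat[c]) for c in x)  # statistic (17)

def logit(j, k):
    """Fitted log-odds of alive versus dead for condition (j, k)."""
    return math.log(m_hat[(1, j, k)] / m_hat[(2, j, k)])

# Under (18) the logits are additive in j and k, so this contrast vanishes.
interaction = logit(1, 1) - logit(1, 2) - logit(2, 1) + logit(2, 2)

print({c: round(v, 1) for c, v in m_hat.items()})
print(round(x2, 2), round(g2, 2), interaction)
```

Starting from a table of ones is a common choice because it already satisfies the model, and each rescaling step preserves that structure while moving the margins toward the observed totals.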

For the Waite fingerprint data of Table 2, one model that has been
considered is the simple additive loglinear model of expression (9), but
only for those cells where positive counts are possible, i.e., in the
upper triangular section. For cells with $i + j > 5$, $m_{ij} = 0$ a priori.
This restricted version of the independence model is referred to as
quasi-independence, and the results of the preceding section can be used in
connection with it. The MSS's are still the row and column totals (Result
A). The likelihood equations under multinomial sampling are (applying
Results B and C):

$$\begin{aligned}
\hat m_{i+} &= x_{i+}, & i &= 0, 1, 2, \ldots, 5, \\
\hat m_{+j} &= x_{+j}, & j &= 0, 1, 2, \ldots, 5,
\end{aligned} \tag{20}$$

where $\hat m_{ij} = 0$ for $i + j > 5$. A solution of equations (20)
satisfying the model can be found directly (see Goodman [1968] or Bishop and
Fienberg [1969]), or by using a standard iterative procedure. The estimated
expected values for the fingerprint data under the model of
quasi-independence are given in Table 4, and they satisfy the marginal
constraints in expression (20).


Table 4: Estimated Expected Values for Fingerprint Data Under
Quasi-Independence

                              Small loops
  Whorls       0        1        2        3        4        5     Total
  0          200.6    167.4    166.6    150.3    131.1     45.0     861
  1          122.2    101.9    101.4     91.6     79.9              497
  2           85.5     71.4     71.0     64.1                       292
  3           63.8     53.2     53.0                                170
  4           70.9     59.1                                         130
  5           50.0                                                   50
  Total      593      453      392      306      211       45      2000

The goodness-of-fit statistics for this model are $X^2 = 399.8$ and
$G^2 = 450.4$, which correspond to values in the very extreme right-hand
tail of the $\chi^2_{10}$ distribution. Thus the model of quasi-independence
seems inappropriate. Darroch [1971] describes the loglinear model of
F-independence (with more parameters than the quasi-independence model),
which takes into account the way in which the constraint, that the number
of small loops plus the number of whorls cannot exceed 5, makes the usual
definition of independence inappropriate. This model in loglinear form
is

$$\log m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{3(5-i-j)}, \tag{21}$$

where the $u_3$-parameters correspond to diagonals where the sum of the
numbers of whorls and small loops is constant. Darroch and Ratcliff [1973]
illustrate the fit of the F-independence model to a related set of
fingerprint data involving large rather than small loops.
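The quasi-independence fit discussed above can be sketched with the same kind of iterative scaling, restricted to the 21 cells of Table 2 where counts are possible. This is an illustrative pure-Python sketch, not the direct solution of the cited papers; the helper names, unit starting values, and cycle count are assumptions. It fits only the quasi-independence model, not the F-independence model (21).

```python
import math

# Observed counts from Table 2: row index = whorls, column index = small loops.
# Only cells with whorls + small loops <= 5 exist, so rows shorten.
ROWS = [
    [78, 144, 204, 211, 179, 45],
    [106, 153, 126, 80, 32],
    [130, 92, 55, 15],
    [125, 38, 7],
    [104, 26],
    [50],
]
x = {(i, j): float(v) for i, row in enumerate(ROWS) for j, v in enumerate(row)}

def totals(table, axis):
    """Row totals (axis 0) or column totals (axis 1) over the existing cells."""
    out = {}
    for cell, value in table.items():
        out[cell[axis]] = out.get(cell[axis], 0.0) + value
    return out

def fit_quasi_independence(observed, cycles=2000):
    """Alternately rescale rows and columns to match the observed totals (20)."""
    fitted = {cell: 1.0 for cell in observed}
    targets = [totals(observed, 0), totals(observed, 1)]
    for _ in range(cycles):
        for axis in (0, 1):
            current = totals(fitted, axis)
            for cell in fitted:
                fitted[cell] *= targets[axis][cell[axis]] / current[cell[axis]]
    return fitted

m_hat = fit_quasi_independence(x)
x2 = sum((x[c] - m_hat[c]) ** 2 / m_hat[c] for c in x)       # statistic (16)
g2 = 2.0 * sum(x[c] * math.log(x[c] / m_hat[c]) for c in x)  # statistic (17)

# 21 cells minus 11 independent parameters (6 row + 6 column - 1).
df = len(x) - (6 + 6 - 1)
print(round(m_hat[(0, 0)], 1), round(x2, 1), round(g2, 1), df)
```

Cells outside the triangle are simply absent from the dictionary, which is one convenient way to encode structural zeros: they never enter a margin and are never rescaled.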


5.2 Multidimensional Contingency Table Analysis

Not all applications of loglinear models involve such simple structures
as $2^3$ tables, or even incomplete 6×6 arrays. Indeed, much of the
methodology was developed in the mid-1960's to deal with very large,
highly multidimensional tables. For example, in the National Halothane
Study (Bunker et al. [1969]), investigators considered data on the use
of (i) 5 anesthetic agents, in operations involving (ii) 4 levels of risk,
and patients of (iii) 2 sexes, (iv) 10 age groups, with (v) 7 differing
physical statuses (levels of anesthetic risk) and (vi) previous operations
(yes, no), for (vii) 3 different years, from (viii) 34 different
institutions. Two sets of data were collected, the first consisting of all
deaths within six weeks of surgery, and the second consisting of a sample
(of comparable size) of all those exposed to surgery. Thus the data
consisted of two very sparse 5×4×2×10×7×2×3×34 tables, each containing in
excess of 57,000 cells. One of the more successful approaches used in the
analysis of the data in these tables was based on loglinear models and
the generalizations of the methods illustrated in this section.

One of the key reasons why loglinear models have become so popular
in such analyses is that they lead to a simplified description of the
data in terms of marginal totals, the minimal sufficient statistics
of Result A of Section 4. This is especially important when the table
of data is large and sparse. For more details on the Halothane Study
analyses, as well as examples of other applications involving four-way
and higher dimensional tables of counts, see Bishop, Fienberg, and Holland
[1975].

A second reason for the popularity of loglinear models relates to
their interpretation. A large subset of these models can be interpreted
in terms of independence or the conditional independence of several discrete
random variables given the values of other discrete variables, thus
generalizing the simple ideas for 2×2 tables outlined in Section 4. For
further details, see any of the books cited in Section 1.

6. A Brief Guide to Additional Applications and Computing Programs

6.1 Novel Applications Involving Contingency Tables

Many data sets can profitably be structured to appear in the form of
a cross-classification of counts, and then analyzed using methods related
to those described in this entry. Some examples of applications where
this has been done include the following:

(a) Capture-multiple-recapture analysis to estimate the size of a
non-changing population (Fienberg [1972], Bishop, Fienberg and Holland
[1975]). If the members of non-changing populations are sampled k
successive times (possibly dependent), then the resulting recapture history
data can be displayed in the form of a $2^k$ table with one missing cell,
corresponding to those never sampled. Such an array is amenable to
loglinear analysis, the results of which can be used to project a value for
the missing cell.

(b) Guttman scaling of a sequence of p dichotomous items (Goodman
[1975]). The items form a perfect Guttman scale if they have an order
such that a positive response to any item implies a positive response
to those items lower in the ordering. Goodman describes an application
of techniques for incomplete multidimensional contingency tables in which
he measures departures from perfect Guttman scales.

(c) Latent structure analysis, where unobservable categorical
variables are included as part of the analysis of categorical data
structures, and the observable variables are taken to be conditionally
independent given the unobservable latent variables (Goodman [1974]; see
also the entry, Categorical Data, by Upton).

(d) Paired comparisons of several objects by a set of judges, with
the outcome being the preference of one object over the other. A
well-known model for paired comparisons, first proposed by Bradley and
Terry, and several extensions to it, can be viewed as loglinear models. Then
relatively standard contingency table methods can be used to analyze paired
comparisons data (see Imrey, Johnson, and Koch [1976], Fienberg and
Larntz [1976], and Fienberg [1979]).
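For application (a), the simplest case is k = 2 samples, where the recapture histories form a 2×2 table with one missing cell. The sketch below projects that cell under the assumption that the two samples are independent, which sets the cross-product ratio (8) to 1; this is only the most basic loglinear model, and the counts and function name are hypothetical.

```python
def project_missing_cell(both, first_only, second_only):
    """Project the never-sampled cell of a 2x2 capture-recapture table.

    Assumes independence of the two samples, so the cross-product ratio
    of the completed table equals 1:
        both * missing / (first_only * second_only) = 1.
    Returns the projected missing count and the implied population size.
    """
    missing = first_only * second_only / both
    population = both + first_only + second_only + missing
    return missing, population

# Hypothetical counts: 20 caught in both samples, 30 only in the first,
# 40 only in the second.
missing, n_hat = project_missing_cell(both=20, first_only=30, second_only=40)
print(missing, n_hat)
```

With more than two samples the independence assumption can be relaxed by fitting richer loglinear models to the incomplete $2^k$ table, which is the setting the cited references address.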

6.2 Computer Programs for Loglinear Model Analysis

As with other forms of multivariate analysis, the analysis of
multidimensional contingency tables relies heavily on computer programs. A
large number of these have been written to compute estimated parameter
values for loglinear models and associated test statistics, and most
computer installations at major universities have one or more programs
available for users.

The most widely used numerical procedure for the calculation of
maximum likelihood estimates for loglinear models is the method of
iterative proportional fitting (IPF), which iteratively adjusts the entries
of a contingency table to have marginal totals equal to those used in
specifying the likelihood equations. Detailed Fortran listings for this
method are available in Haberman [1972, 1973], and they have been
implemented in the BMDP Programs distributed by the UCLA Health Sciences
Computing Facility (Dixon and Brown [1979]), as well as in a variety of
other forms. IPF programs also exist in other languages such as APL
(e.g., see Fox [1979]). The major advantage of the IPF method is that
it requires limited computer memory capabilities since it does not require
matrix inversion or equivalent computations, and thus can be used in
connection with the analysis of very high dimensional tables. Its major
disadvantage is that it does not provide, in an easily accessible form,
estimates of the basic loglinear model parameters (and an estimate of
their asymptotic covariance matrix); it only provides estimated expected
values.

The other numerical approaches suggested for the computation of
maximum likelihood estimates are typically based on classical procedures
for solving nonlinear equations such as modifications of Newton's method
or the Newton-Raphson method (e.g., see the listing in Haberman [1979]).
Currently the most widely used such program is GLIM, distributed by the
Numerical Algorithms Group of the United Kingdom (Baker and Nelder
[1978]), which fits a class of generalized linear models, of which
loglinear and logit models are special cases. The virtue of these programs
is that they produce both estimated expected values, and estimated
parameter values and an estimate of the asymptotic covariance matrix.
Unfortunately, such output comes at the expense of added storage and these
programs cannot handle analyses for very large contingency tables.

Several groups of researchers are currently at work adapting variants of
Newton's method using numerical techniques that will allow for increased
storage capacity, and thus the analysis of larger tables than is currently
possible.

Computation problems remain as a major stumbling block to the
widespread application of loglinear model methods to the analysis of large
data sets structured in the form of multidimensional cross-classifications
of counts.